Abstract —We investigate the performance of distributed least-mean-square
(LMS) algorithms for parameter estimation over
sensor networks where the regression data of each node are 
corrupted by white measurement noise. Under this condition, 
we show that the estimates produced by distributed LMS 
algorithms will be biased if the regression noise is excluded 
from consideration. We propose a bias-elimination technique 
and develop a novel class of diffusion LMS algorithms that can 
mitigate the effect of regression noise and obtain an unbiased 
estimate of the unknown parameter vector over the network. 
In our development, we first assume that the variances of
the regression noises are known a priori. Later, we relax this
assumption by estimating these variances in real time. We analyze
the stability and convergence of the proposed algorithms and 
derive closed-form expressions to characterize their mean-square 
error performance in transient and steady-state regimes. We 
further provide computer experiment results that illustrate the 
efficiency of the proposed algorithms and support the analytical 
findings. 

Index Terms —diffusion adaptation, bias-compensated LMS, 
distributed parameter estimation, network optimization 

I. Introduction 

One of the critical issues encountered in distributed
parameter estimation over sensor networks is the distortion
of the collected regression data by noise, which occurs
when the local copy of the underlying system input signal 
at each node is corrupted by various sources of impairments 
such as measurement or quantization noise. This problem 
has been extensively investigated for the case of single-node 
processing devices [2]-[17]. These studies have shown that 
if the deleterious effect of the input noise is not taken into 
account, the parameter estimates so obtained will be inaccurate 
and biased. Various practical solutions have been suggested 
to mitigate the effect of the input measurement noise or to 
remove the bias from the resulting estimates [5]-[17]. These 
solutions, however, may no longer lead to optimal results in
sensor networks with a decentralized processing structure where
the data measurement and parameter estimation are performed 
at multiple processing nodes in parallel and with cooperation. 

For networking applications, a distributed total-least-squares 
(DTLS) algorithm has been proposed that is developed using 
semidefinite relaxation and convex semidefinite programming 
[18]. This algorithm mitigates the effect of white input noise 
by running a local TLS algorithm at each sensor node and
exchanging the locally estimated parameters between the 
nodes for further refinement. The DTLS algorithm computes 
the eigendecomposition of an augmented covariance matrix at 
every iteration for all nodes in the network, and is therefore 
mainly suitable for applications involving nodes with powerful 
processing abilities. In a follow-up paper, the same authors
proposed a low-complexity DTLS algorithm [19] that uses an
inverse power iteration technique to reduce the computational
complexity of the DTLS while demanding lower communication
power.

In recent years, several classes of distributed adaptive 
algorithms for parameter estimation over networks have been 
proposed, including incremental [20]-[23], consensus [24]-[27],
[27]-[30], and diffusion algorithms [31]-[42]. Incremental
techniques require the definition of a cyclic path over the
nodes, which is generally an NP-hard problem; these techniques
are also sensitive to link failures. Consensus techniques
require doubly-stochastic combination policies and, when used 
in the context of adaptation with constant step-sizes, can lead 
to unstable behavior even if all individual nodes can solve the 
inference task in a stable manner [38]. In this work, we focus 
on diffusion strategies because they have been shown to be 
more robust and to lead to a stable behavior regardless of the 
underlying topology, even when some of the underlying nodes 
are unstable [38]. 

A bias-compensated diffusion-based recursive least-squares 
(RLS) algorithm has been developed in [43] that can obtain 
unbiased estimates of the unknown system parameters over 
sensor networks, where the regression data are distorted by 
colored noise. While this algorithm offers fast convergence 
speed, its high computational complexity and numerical instability
may be a hindrance in some applications. In contrast, the
diffusion LMS algorithms are characterized by low complexity 
and numerical stability. Motivated by these features, in this 
paper, we investigate the performance of standard diffusion 
LMS algorithms [31]-[33] over sensor networks where the 
input regression data are corrupted by additive white noise. To 
overcome the limitations of these algorithms, as exposed by 
our analysis under this scenario, we then propose an alternative 
problem formulation that leads to a novel class of diffusion 
LMS algorithms, which we call bias-compensated diffusion 
strategies. 

More specifically, we first show that in the presence of 
noisy input data, the parameter estimates produced by standard 
diffusion LMS algorithms are biased. We then reformulate this 
estimation problem in terms of an alternative cost function 
and develop bias-compensated diffusion LMS strategies that
can produce unbiased estimates of the system parameters. The
development of these algorithms relies on a bias-elimination 
strategy that assumes prior knowledge about the regression 
noise variances over the network. The analysis results show 
that if the step-sizes are within a given range, the algorithms 
will be stable in the mean and mean-square sense and the 
estimated parameters will converge to their true values. Finally, 
we relax the known variance assumption by incorporating a 
recursive approach into the algorithm to estimate the variances 
in real-time. 

In summary, the contributions of this article are: a) performance
evaluation of standard diffusion LMS algorithms in
networks with noisy input regression data; b) development
of a novel class of diffusion LMS strategies that are robust
under this condition; c) presentation of a recursive estimation
approach to obtain the regression noise variances without
using the second-order statistics of the data; d) derivation of
conditions under which the proposed algorithms are stable in
the mean and mean-square sense; e) characterization of their
mean-square deviation (MSD) and excess mean-square error
(EMSE) in transient and steady-state regimes; and f) validation
of the theoretical findings through numerical simulations of the newly
proposed algorithms for parameter estimation over sensor
networks.

The remainder of the paper is organized as follows. In the 
next section, we formulate the problem and discuss the effects 
of input measurement noise on the performance of diffusion 
LMS over sensor networks. In Section III, we propose bias- 
compensated diffusion LMS algorithms along with a recursive 
estimation of the regression noise variance. In Section IV, 
we analyze the stability and convergence behavior of the 
developed algorithms, and obtain conditions under which the 
algorithms are stable in the mean and mean-square sense. We 
present the computer experiment results in Section VI, and 
conclude the paper in Section VII. 

Notation: Matrices are represented by uppercase fonts,
vectors by lowercase fonts. Boldface letters are reserved for
random variables and normal letters are used for deterministic
variables. Superscripts $(\cdot)^T$ and $(\cdot)^*$, respectively, denote
transposition and conjugate transposition. Symbols $\mathrm{Tr}(\cdot)$ and
$\rho(\cdot)$ denote the trace and spectral radius of their matrix argument.
The operator $E[\cdot]$ stands for statistical expectation, and
$\lambda_k(\cdot)$ denotes the $k$-th eigenvalue of its matrix argument. The
Kronecker product is denoted by $\otimes$ and the block Kronecker
product [44] is denoted by $\otimes_b$. The operator $\mathrm{diag}\{\cdot\}$ converts
its argument list into a (block) diagonal matrix. The operator
$\mathrm{col}\{\cdot\}$ performs a vertical stacking of its arguments while
$\mathrm{vec}(\cdot)$ is the standard vectorization for matrices. The symbol
$\mathrm{bvec}(\cdot)$ is the block vectorization operator that transforms a
block-partitioned matrix into a column vector [44].

II. Problem Statement

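Consider a collection of $N$ sensor nodes distributed over
a geographical area and used to monitor a physical phenomenon
characterized by some unknown parameter vector
$w^o \in \mathbb{C}^{M \times 1}$. As illustrated in Fig. 1, at discrete time $i \in \mathbb{N}$,
each node $k \in \{1, 2, \ldots, N\}$ collects noisy samples of
the system input and output denoted by $\boldsymbol{z}_{k,i} \in \mathbb{C}^{1 \times M}$ and
$\boldsymbol{d}_k(i) \in \mathbb{C}$, respectively. These measurement samples can be
expressed as:

$$\boldsymbol{z}_{k,i} = \boldsymbol{u}_{k,i} + \boldsymbol{n}_{k,i} \tag{1}$$

$$\boldsymbol{d}_k(i) = \boldsymbol{u}_{k,i} w^o + \boldsymbol{v}_k(i) \tag{2}$$

where $\boldsymbol{u}_{k,i} \in \mathbb{C}^{1 \times M}$, $\boldsymbol{n}_{k,i} \in \mathbb{C}^{1 \times M}$, and $\boldsymbol{v}_k(i) \in \mathbb{C}$,
respectively, denote the regression data vector, the input measurement
noise vector, and the output measurement noise*.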

Assumption 1. The random variables in data model (1)-(2)
satisfy the following conditions:

a) The regression data vectors $\boldsymbol{u}_{k,i}$ are independent and identically
distributed (i.i.d.) over time and independent
over space, with zero mean and covariance matrix
$R_{u,k} = E[\boldsymbol{u}_{k,i}^* \boldsymbol{u}_{k,i}] > 0$.

b) The regression noise vectors $\boldsymbol{n}_{k,i}$ are Gaussian, i.i.d. over
time and independent over space, with zero mean and
covariance matrix $R_{n,k} = E[\boldsymbol{n}_{k,i}^* \boldsymbol{n}_{k,i}] = \sigma_{n,k}^2 I_M$.

c) The output noise samples $\boldsymbol{v}_k(i)$ are i.i.d. over time and
independent over space, with zero mean and variance
$\sigma_{v,k}^2$.

d) The random variables $\boldsymbol{u}_{k,i}$, $\boldsymbol{n}_{\ell,j}$ and $\boldsymbol{v}_p(m)$ are independent
for all $k, \ell, p, i, j$, and $m$.


Fig. 1: Measurement model for node k.


The linear model (1)-(2) differs from those used in previous
works on distributed estimation, such as [23], [31], [33]. In
these references, it is assumed that the actual regression vector
$\boldsymbol{u}_{k,i}$ is available at each node $k$. There are many practical
situations, however, where the nodes only have access to noisy
measurements of the regression data. We use relation (1) to
model such disturbance in the regressors, and to investigate the
effect of the noise process $\boldsymbol{n}_{k,i}$ on the distributed estimation
of $w^o$. To better understand the effect of this noise, we first
examine the behavior of a centralized estimation solution
under this condition and then explain how the resulting effect
carries over to distributed approaches.

*We use parentheses to refer to the time indices of scalar variables, such
as $\boldsymbol{d}_k(i)$, and subscripts to refer to time indices of vector variables, such as
$\boldsymbol{u}_{k,i}$.

In centralized estimation, nodes transmit their measurement
data $\{\boldsymbol{z}_{k,i}, \boldsymbol{d}_k(i)\}_{k=1}^{N}$ to a central processing unit. In the
absence of measurement noise, i.e., $\boldsymbol{n}_{k,i} = 0$, the central processor
can estimate the unknown parameter vector $w^o$ by, e.g.,
minimizing the following mean-square error (MSE) function
[45]:

$$J_u(w) = \sum_{k=1}^{N} E|\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} w|^2. \tag{3}$$

Let us introduce $r_{du,k} = E[\boldsymbol{d}_k(i) \boldsymbol{u}_{k,i}^*]$ and denote the sums of
covariance matrices and cross-covariance vectors over the set
of nodes by:

$$R_u = \sum_{k=1}^{N} R_{u,k}, \qquad r_{du} = \sum_{k=1}^{N} r_{du,k}. \tag{4}$$

It can be verified that under Assumption 1, the solution of (3)
is:

$$w^o = R_u^{-1} r_{du}. \tag{5}$$

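Now consider the recovery of the unknown parameter vector
$w^o$ for the noisy regression system described by (1) and (2).
Since the regression noise $\boldsymbol{n}_{k,i}$ is independent of $\boldsymbol{u}_{k,i}$ and
$\boldsymbol{d}_k(i)$, we have

$$R_{z,k} = E[\boldsymbol{z}_{k,i}^* \boldsymbol{z}_{k,i}] = R_{u,k} + \sigma_{n,k}^2 I_M \tag{6}$$

$$r_{dz,k} = E[\boldsymbol{d}_k(i) \boldsymbol{z}_{k,i}^*] = r_{du,k}. \tag{7}$$

Considering these relations and now minimizing the global
MSE function

$$J_z(w) = \sum_{k=1}^{N} E|\boldsymbol{d}_k(i) - \boldsymbol{z}_{k,i} w|^2 \tag{8}$$

with $\boldsymbol{u}_{k,i}$ in (3) replaced by $\boldsymbol{z}_{k,i}$ in (8), we arrive at the biased
solution

$$w^b = (R_u + \sigma_n^2 I)^{-1} r_{du} \tag{9}$$

where

$$\sigma_n^2 = \sum_{k=1}^{N} \sigma_{n,k}^2. \tag{10}$$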

Let us define the bias implicit in solution (9) as
$b = w^o - w^b$. To evaluate $b$, we may use the following identity,
which holds for square matrices $X_1$ and $X_2$ provided that $X_1$
and $X_1 + X_2$ are both invertible [46]:

$$(X_1 + X_2)^{-1} = X_1^{-1} - (I + X_1^{-1} X_2)^{-1} X_1^{-1} X_2 X_1^{-1}. \tag{11}$$

Here $R_u$ and $(R_u + \sigma_n^2 I)$ are invertible, and therefore, we
obtain:

$$(R_u + \sigma_n^2 I)^{-1} = R_u^{-1} - \sigma_n^2 (I + \sigma_n^2 R_u^{-1})^{-1} R_u^{-2}. \tag{12}$$

Considering this expression and relation (9), the bias resulting
from the minimum MSE estimation at the fusion center can
be expressed as:

$$b = \sigma_n^2 (I + \sigma_n^2 R_u^{-1})^{-1} R_u^{-1} w^o. \tag{13}$$
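As a quick numerical sanity check, the bias can be evaluated in the scalar case ($M = 1$), where every matrix in (9) and (13) reduces to a number. The sketch below is only illustrative: the function name and the numerical values are hypothetical, and it simply confirms that the direct difference $w^o - w^b$ agrees with the closed form (13).

```python
def centralized_bias(r_u, sigma_n2, w_o):
    """Scalar (M = 1) illustration of the bias in (9) and (13).

    r_u      : aggregate regressor variance R_u (assumed value)
    sigma_n2 : aggregate regression-noise variance sigma_n^2 (assumed value)
    w_o      : true parameter (assumed value)
    """
    r_du = r_u * w_o                       # from (5): w_o = r_du / R_u
    w_b = r_du / (r_u + sigma_n2)          # biased minimizer, as in (9)
    direct = w_o - w_b                     # b = w_o - w_b
    # closed form (13): sigma_n^2 (1 + sigma_n^2 / R_u)^{-1} R_u^{-1} w_o
    closed_form = sigma_n2 / (1.0 + sigma_n2 / r_u) / r_u * w_o
    return w_b, direct, closed_form


w_b, direct, closed_form = centralized_bias(r_u=2.0, sigma_n2=0.5, w_o=3.0)
print(w_b, direct, closed_form)
```

In this toy setting the estimate is shrunk toward zero by the factor $R_u / (R_u + \sigma_n^2)$, so the bias grows with the regression noise power.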

In the absence of regressor noise, it has been shown in 
previous studies that the parameter estimates obtained from 
standard diffusion LMS strategies approach the minimizer of 
the network global MSE function [33]. This also holds in noisy 
regression applications for diffusion LMS developed based on 
the global cost (8), meaning that the estimates generated by 
standard diffusion LMS algorithms will eventually approach 
(9). As shown by (13), this solution is biased and deviates
from the optimal estimate by $b$. This issue will become more
explicit in our convergence analysis in Section IV-A.

In the sequel, we explain how, by forming a suitable objective
function, the bias can be compensated in both centralized and
distributed LMS implementations.

III. Bias-Compensated LMS Algorithms 

In our development, we initially assume that the regression
noise variances, $\{\sigma_{n,k}^2\}_{k=1}^{N}$, are known a priori. We later
remove this assumption by estimating these variances in real
time. In networks with a centralized signal processing structure,
one way to obtain the unbiased optimal solution (5) is to search
for a global cost function whose gradient vector is identical to
that of cost (3). It is straightforward to verify that the following
global cost function satisfies this requirement:

$$J(w) = \sum_{k=1}^{N} E|\boldsymbol{d}_k(i) - \boldsymbol{z}_{k,i} w|^2 - \sum_{k=1}^{N} \sigma_{n,k}^2 \|w\|^2. \tag{14}$$
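The verification can be sketched as follows. Writing $\sigma_{d,k}^2 = E|\boldsymbol{d}_k(i)|^2$ and using relations (6) and (7), each mean-square term of the noisy-regressor cost expands as shown below. This is a short worked derivation under Assumption 1, not a substitute for the full argument.

```latex
\begin{align*}
E|\boldsymbol{d}_k(i) - \boldsymbol{z}_{k,i} w|^2
  &= \sigma_{d,k}^2 - r_{dz,k}^* w - w^* r_{dz,k} + w^* R_{z,k} w \\
  &= \sigma_{d,k}^2 - r_{du,k}^* w - w^* r_{du,k}
     + w^* \left( R_{u,k} + \sigma_{n,k}^2 I_M \right) w \\
  &= E|\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} w|^2 + \sigma_{n,k}^2 \|w\|^2 .
\end{align*}
```

Subtracting $\sigma_{n,k}^2 \|w\|^2$ from both sides and summing over $k$ shows that the compensated cost coincides with the noise-free cost (3) for every $w$, so the two share the same gradient and the same minimizer $w^o$.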

Remark 1. In bias-compensation techniques for single-node 
adaptive algorithms, including [12], [13], [16], the authors 
first apply a least squares (LS) or minimum MSE procedure 
to obtain an estimate of the unknown parameter vector. The 
resulting estimate consists of the desired solution along with 
an additive bias term. The bias, which is normally expressed 
in terms of the second order statistics of the regression data 
and the input and output measurement noises, is removed from 
the solution by subtraction. In the proposed technique in this 
paper, we start by considering bias removal one step earlier, 
meaning that we design a convex objective function such that 
its unique stationary point leads to an unbiased estimate. 
In this respect, our approach is mostly inspired by the
derivation of the modified LMS and RLS algorithms in [8],
[15]. However, these algorithms still assume the knowledge 
of the ratio of input-to-output noise variances in their update 
equations. 

The derivation of distributed algorithms will be made easier
if we can decouple the network global cost function and write
it as a sum of local cost functions that are formed using the
local data. The global cost (14) already has such a desired
form. For this to become more explicit, we express (14) as:

$$J(w) = \sum_{k=1}^{N} J_k(w) \tag{15}$$

where $J_k(w)$ is the cost function associated with node $k$ and
is given in terms of the local data $\{\boldsymbol{d}_k(i), \boldsymbol{z}_{k,i}\}$, i.e.,

$$J_k(w) = E|\boldsymbol{d}_k(i) - \boldsymbol{z}_{k,i} w|^2 - \sigma_{n,k}^2 \|w\|^2. \tag{16}$$

Remark 2. Under Assumption 1, the Hessian matrix of (16) is
positive definite, i.e., $\nabla_w^2 J_k(w) > 0$; hence, $J(w)$ is strongly
convex [47].

Below, we first comment on the centralized LMS algorithm 
that solves (14), and then elaborate on how to develop the 
unbiased distributed counterparts. 



A. Bias-Compensated Centralized LMS Algorithm 

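To minimize (15) iteratively, a centralized steepest descent
algorithm [45] can be implemented as:

$$w_i = w_{i-1} - \mu \Big[ \sum_{k=1}^{N} \nabla J_k(w_{i-1}) \Big]^* \tag{17}$$

where $\mu > 0$ is the step-size, and $\nabla J_k(w)$ is a row vector
representing the gradient of $J_k$ with respect to the vector $w$.
Computing the gradient vectors from (16) leads to:

$$w_i = w_{i-1} + \mu \sum_{k=1}^{N} \left( r_{dz,k} - R_{z,k} w_{i-1} + \sigma_{n,k}^2 w_{i-1} \right). \tag{18}$$

In practice, the moments $R_{z,k}$ and $r_{dz,k}$ are usually unavailable.
We, therefore, replace these moments by their instantaneous
approximations $\boldsymbol{z}_{k,i}^* \boldsymbol{z}_{k,i}$ and $\boldsymbol{z}_{k,i}^* \boldsymbol{d}_k(i)$, respectively,
and obtain the bias-compensated centralized LMS algorithm:

$$\boldsymbol{w}_i = \boldsymbol{w}_{i-1} + \mu \sum_{k=1}^{N} \left[ \boldsymbol{z}_{k,i}^* \big( \boldsymbol{d}_k(i) - \boldsymbol{z}_{k,i} \boldsymbol{w}_{i-1} \big) + \sigma_{n,k}^2 \boldsymbol{w}_{i-1} \right]. \tag{19}$$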

In Section V, we propose an adaptive scheme to estimate the 
variances of the regression noise required in the above central¬ 
ized LMS algorithm as well as in its distributed counterpart 
derived below. 
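The effect of the compensation term can be illustrated with a small simulation of the centralized recursion for real scalar data ($M = 1$). The following is a minimal sketch under assumed settings: three nodes, unit regressor variance, known noise variances, and a hypothetical true parameter $w^o = 1$. None of these values come from the paper. Setting `compensate=False` drops the $\sigma_{n,k}^2 \boldsymbol{w}_{i-1}$ term and gives the standard centralized LMS update.

```python
import random


def run_centralized_lms(compensate, n_iter=20000, mu=0.005, seed=1):
    """Scalar centralized LMS with optional bias compensation.

    Returns the estimate averaged over the last 5000 iterations.
    All numerical settings below are illustrative assumptions.
    """
    rng = random.Random(seed)
    w_o = 1.0                          # hypothetical true parameter
    sigma_u2 = [1.0, 1.0, 1.0]         # regressor variances R_{u,k}
    sigma_n2 = [0.5, 0.5, 0.5]         # regression-noise variances (assumed known)
    sigma_v2 = [0.01, 0.01, 0.01]      # output-noise variances
    w = 0.0
    tail = []
    for i in range(n_iter):
        update = 0.0
        for k in range(3):
            u = rng.gauss(0.0, sigma_u2[k] ** 0.5)
            z = u + rng.gauss(0.0, sigma_n2[k] ** 0.5)           # noisy regressor
            d = u * w_o + rng.gauss(0.0, sigma_v2[k] ** 0.5)     # noisy output
            update += z * (d - z * w)                            # standard LMS term
            if compensate:
                update += sigma_n2[k] * w                        # bias-compensation term
        w += mu * update
        if i >= n_iter - 5000:
            tail.append(w)
    return sum(tail) / len(tail)


w_standard = run_centralized_lms(compensate=False)
w_compensated = run_centralized_lms(compensate=True)
print(w_standard, w_compensated)
```

With these settings the uncompensated recursion should settle near $R_u w^o / (R_u + \sigma_n^2) = 2/3$, the scalar form of the biased solution, while the compensated recursion should settle near $w^o$.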


B. Bias-Compensated Diffusion LMS Strategies 

There exist different distributed optimization techniques
that can be applied to (14) to find $w^o$ [31], [33], [48]. We
concentrate on diffusion strategies [31], [33] because they 
endow the network with real-time adaptation and learning 
abilities. In particular, diffusion optimization strategies lead to 
distributed algorithms that can estimate the parameter vector 
w° and track its changes over time [31], [33], [37], [49]. 
Here, we briefly explain how diffusion LMS algorithms can 
be developed for parameter estimation in systems with noisy 
regression data. The main step in the development of these 
algorithms is to reformulate the global cost (14) and represent 
it as a group of optimization problems of the form: 

$$\min_{w} \; \sum_{\ell \in \mathcal{N}_k} c_{\ell,k} \left( E|\boldsymbol{d}_\ell(i) - \boldsymbol{z}_{\ell,i} w|^2 - \sigma_{n,\ell}^2 \|w\|^2 \right) + \sum_{\ell \in \mathcal{N}_k \setminus \{k\}} b_{\ell,k} \|w - w^o\|^2 \tag{20}$$

where $\mathcal{N}_k$ is the set of nodes with which node $k$ shares
information, including node $k$ itself. The nonnegative scalars
$\{c_{\ell,k}\}$ are the entries of a right-stochastic matrix $C \in \mathbb{R}^{N \times N}$
which satisfy

$$c_{\ell,k} = 0 \ \text{if} \ \ell \notin \mathcal{N}_k, \quad \text{and} \quad \sum_{k=1}^{N} c_{\ell,k} = 1. \tag{21}$$

The scalars $\{b_{\ell,k}\}$ are scaling coefficients that will end up
being incorporated into the combination coefficients $\{a_{\ell,k}\}$
that appear in the final statement (23) of the algorithm below.
The first term in the objective function (20) is the modified
mean-square function incorporating the noise variances of the
neighboring nodes $\ell \in \mathcal{N}_k$. This part of the objective is based
on the same strategy as in the above centralized objective
function for bias removal. The second term in (20) is in fact a
constraint that forces the estimate of node $k$ to be aligned
with the true parameter vector $w^o$. Since $w^o$ is not known
initially, it will be alternatively substituted by an appropriate
vector during the optimization process. One can use the cost
function (20) and follow similar arguments to those used in
[33], [37], [49] to arrive at the bias-compensated adapt-then-combine
(ATC) LMS strategy (Algorithm 1). Due to space
limitations, these steps are omitted.


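Algorithm 1: ATC Bias-Compensated Diffusion LMS

$$\boldsymbol{\psi}_{k,i} = \boldsymbol{w}_{k,i-1} - \mu_k \sum_{\ell \in \mathcal{N}_k} c_{\ell,k} \big[ \widehat{\nabla} J_\ell(\boldsymbol{w}_{k,i-1}) \big]^* \tag{22}$$

$$\boldsymbol{w}_{k,i} = \sum_{\ell \in \mathcal{N}_k} a_{\ell,k} \boldsymbol{\psi}_{\ell,i} \tag{23}$$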


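In this algorithm, $\mu_k > 0$ is the step-size at node $k$, the vectors
$\boldsymbol{\psi}_{k,i}$ and $\boldsymbol{w}_{k,i}$ are the intermediate estimates of $w^o$ at node $k$,
and the stochastic gradient vector is computed as:

$$\big[ \widehat{\nabla} J_\ell(\boldsymbol{w}_{k,i-1}) \big]^* = - \left[ \boldsymbol{z}_{\ell,i}^* \big( \boldsymbol{d}_\ell(i) - \boldsymbol{z}_{\ell,i} \boldsymbol{w}_{k,i-1} \big) + \sigma_{n,\ell}^2 \boldsymbol{w}_{k,i-1} \right] \tag{24}$$

which is an instantaneous approximation to the gradient of (16).
Moreover, the nonnegative coefficients $a_{\ell,k}$ are the elements
of a left-stochastic matrix $A \in \mathbb{R}^{N \times N}$ satisfying

$$a_{\ell,k} = 0 \ \text{if} \ \ell \notin \mathcal{N}_k, \quad \text{and} \quad \sum_{\ell \in \mathcal{N}_k} a_{\ell,k} = 1. \tag{25}$$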
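To make the adapt-then-combine structure concrete, the following is a minimal sketch of the two steps for real-valued data. It rests on several simplifying assumptions that are not taken from the paper: a four-node ring network, $M = 2$, $C = I$ (so each node adapts with its own data only), uniform combination weights $a_{\ell,k} = 1/|\mathcal{N}_k|$, and known noise variances. The parameter values and the function name are hypothetical.

```python
import random


def atc_bc_diffusion_lms(n_iter=20000, mu=0.01, seed=7):
    """ATC bias-compensated diffusion LMS on a 4-node ring (illustrative only)."""
    rng = random.Random(seed)
    w_o = [1.0, -0.5]                      # hypothetical true parameter, M = 2
    M, N = len(w_o), 4
    sigma_n2 = [0.3, 0.5, 0.2, 0.4]        # regression-noise variances (assumed known)
    sigma_v = 0.1                          # output-noise standard deviation
    # ring topology; each neighborhood N_k includes node k itself
    nbrs = [[(k - 1) % N, k, (k + 1) % N] for k in range(N)]
    w = [[0.0] * M for _ in range(N)]
    tail = 5000
    avg = [0.0] * M                        # network-average estimate over the tail
    for i in range(n_iter):
        psi = []
        for k in range(N):
            u = [rng.gauss(0.0, 1.0) for _ in range(M)]
            z = [u[m] + rng.gauss(0.0, sigma_n2[k] ** 0.5) for m in range(M)]
            d = sum(u[m] * w_o[m] for m in range(M)) + rng.gauss(0.0, sigma_v)
            err = d - sum(z[m] * w[k][m] for m in range(M))
            # adaptation step, as in (22) with C = I and the gradient (24)
            psi.append([w[k][m] + mu * (z[m] * err + sigma_n2[k] * w[k][m])
                        for m in range(M)])
        # combination step, as in (23), with uniform left-stochastic weights
        w = [[sum(psi[l][m] for l in nbrs[k]) / len(nbrs[k]) for m in range(M)]
             for k in range(N)]
        if i >= n_iter - tail:
            for m in range(M):
                avg[m] += sum(w[k][m] for k in range(N)) / (N * tail)
    return w_o, avg


w_true, w_avg = atc_bc_diffusion_lms()
print(w_true, w_avg)
```

Swapping the order of the two steps inside the loop (combine first, then adapt from the combined vector) gives the corresponding combine-then-adapt variant.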


To run the algorithm, we only need to select the coefficients
$\{a_{\ell,k}, c_{\ell,k}\}$, which can be computed based on any combination
rules that satisfy (21) and (25). One choice to compute the
entries of matrix $A$ is:

$$a_{\ell,k} = \frac{\sigma_{n,\ell}^{-2}}{\sum_{j \in \mathcal{N}_k} \sigma_{n,j}^{-2}} \ \text{for} \ \ell \in \mathcal{N}_k \setminus \{k\}, \quad \text{and} \quad a_{k,k} = 1 - \sum_{\ell \in \mathcal{N}_k \setminus \{k\}} a_{\ell,k}. \tag{26}$$

This rule implies that the entry $a_{\ell,k}$ is inversely proportional
to the regressor noise variance of node $\ell$. Other left-stochastic
choices for $A$ are possible, including those that take into
account both the noise variances and the degree of connectivity
of the nodes [39].
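A rule of this kind can be sketched in a few lines. The helper below assumes that each neighbor's weight is proportional to the inverse of its regression noise variance, normalized over the neighborhood, with the self-weight taking whatever remains so that each column sums to one. The function name, the zero-based node indexing, and the example network are all hypothetical.

```python
def relative_variance_weights(neighbors, sigma_n2):
    """Build a left-stochastic matrix A with A[l][k] = a_{l,k}.

    neighbors[k] lists the nodes in N_k (including k itself);
    sigma_n2[l] is the regression-noise variance at node l.
    """
    n = len(neighbors)
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        total = sum(1.0 / sigma_n2[j] for j in neighbors[k])
        for l in neighbors[k]:
            if l != k:
                A[l][k] = (1.0 / sigma_n2[l]) / total
        # the self-weight absorbs the remainder so that column k sums to one
        A[k][k] = 1.0 - sum(A[l][k] for l in neighbors[k] if l != k)
    return A


# hypothetical 3-node line network: 0 - 1 - 2
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
sigma_n2 = [0.1, 0.2, 0.4]
A = relative_variance_weights(neighbors, sigma_n2)
column_sums = [sum(A[l][k] for l in range(3)) for k in range(3)]
print(A, column_sums)
```

In this example node 1 gives more weight to node 0 than to node 2, because node 0 has the smaller regression noise variance.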

By reversing the order of the adaptation and combination
steps in Algorithm 1, we can obtain the following combine-then-adapt
(CTA) diffusion strategy.

Algorithm 2: CTA Bias-Compensated Diffusion LMS

$$\boldsymbol{\psi}_{k,i-1} = \sum_{\ell \in \mathcal{N}_k} a_{\ell,k} \boldsymbol{w}_{\ell,i-1} \tag{27}$$

$$\boldsymbol{w}_{k,i} = \boldsymbol{\psi}_{k,i-1} - \mu_k \sum_{\ell \in \mathcal{N}_k} c_{\ell,k} \big[ \widehat{\nabla} J_\ell(\boldsymbol{\psi}_{k,i-1}) \big]^* \tag{28}$$

As we will show in the analysis, the proposed ATC and CTA
bias-compensated diffusion LMS algorithms will, on average,
converge to the unbiased solution (5) even when the regression
data are corrupted by noise. In comparison, the estimates of
previous diffusion LMS strategies, such as the one proposed
in [33], will be biased under such conditions.

Remark 3. In the proposed ATC algorithm, each node $k$ receives
$\{\boldsymbol{z}_{\ell,i}, \boldsymbol{d}_\ell(i), \sigma_{n,\ell}^2\}$ from its neighbors in the adaptation
step, and $\boldsymbol{\psi}_{\ell,i}$ in the combination step, where $\ell \in \mathcal{N}_k$. In total,
it will receive $(2M + 2)|\mathcal{N}_k|$ scalar data from its neighbors.
To reduce the communication overhead of the network, one
solution is to choose $C = I$. Doing so, we can reduce the
amount of exchanged data at each node $k$ to $M|\mathcal{N}_k|$ while
maintaining nearly the same performance, as evidenced
in Section VI. Note that the amount of information exchange
in this case will be equal to that of the standard ATC diffusion
LMS in [33]. This conclusion is also valid for the proposed
CTA Algorithm 2.

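IV. Performance Analysis

In this section, we analyze the convergence and stability of
the proposed ATC and CTA bias-compensated diffusion LMS
algorithms by viewing them as special cases of a more general
diffusion algorithm of the form:

$$\boldsymbol{\phi}_{k,i-1} = \sum_{\ell \in \mathcal{N}_k} a_{\ell,k}^{(1)} \boldsymbol{w}_{\ell,i-1} \tag{29}$$

$$\boldsymbol{\psi}_{k,i} = \boldsymbol{\phi}_{k,i-1} - \mu_k \sum_{\ell \in \mathcal{N}_k} c_{\ell,k} \big[ \widehat{\nabla} J_\ell(\boldsymbol{\phi}_{k,i-1}) \big]^* \tag{30}$$

$$\boldsymbol{w}_{k,i} = \sum_{\ell \in \mathcal{N}_k} a_{\ell,k}^{(2)} \boldsymbol{\psi}_{\ell,i} \tag{31}$$

where $\{a_{\ell,k}^{(1)}\}$ and $\{a_{\ell,k}^{(2)}\}$ are non-negative real coefficients
corresponding to the $(\ell, k)$-th entries of left-stochastic matrices
$A_1$ and $A_2$, respectively, which have the same properties as
$A$. Different choices for $A_1$ and $A_2$ correspond to different
operation modes. For instance, $A_1 = I$ and $A_2 = A$
correspond to ATC whereas $A_1 = A$ and $A_2 = I$ generate
CTA. For mathematical tractability, in our analysis, we assume
that the variances of the regression noises, i.e., $\{\sigma_{n,k}^2\}$, over the
network are known a priori.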

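We define the local weight-error vectors as $\tilde{\boldsymbol{w}}_{k,i} = w^o - \boldsymbol{w}_{k,i}$,
$\tilde{\boldsymbol{\psi}}_{k,i} = w^o - \boldsymbol{\psi}_{k,i}$ and $\tilde{\boldsymbol{\phi}}_{k,i} = w^o - \boldsymbol{\phi}_{k,i}$, and form the
global weight-error vectors by stacking the local error vectors,
i.e.:

$$\tilde{\boldsymbol{\phi}}_i = \mathrm{col}\{\tilde{\boldsymbol{\phi}}_{1,i}, \tilde{\boldsymbol{\phi}}_{2,i}, \ldots, \tilde{\boldsymbol{\phi}}_{N,i}\} \tag{32}$$

$$\tilde{\boldsymbol{\psi}}_i = \mathrm{col}\{\tilde{\boldsymbol{\psi}}_{1,i}, \tilde{\boldsymbol{\psi}}_{2,i}, \ldots, \tilde{\boldsymbol{\psi}}_{N,i}\} \tag{33}$$

$$\tilde{\boldsymbol{w}}_i = \mathrm{col}\{\tilde{\boldsymbol{w}}_{1,i}, \tilde{\boldsymbol{w}}_{2,i}, \ldots, \tilde{\boldsymbol{w}}_{N,i}\}. \tag{34}$$

We also define the block variables:

$$\boldsymbol{g}_i = \mathcal{C}^T \mathrm{col}\{\boldsymbol{z}_{1,i}^* \boldsymbol{v}_1(i), \ldots, \boldsymbol{z}_{N,i}^* \boldsymbol{v}_N(i)\} \tag{35}$$

$$\boldsymbol{\mathcal{R}}_i = \mathrm{diag}\Big\{ \sum_{\ell \in \mathcal{N}_k} c_{\ell,k} \big( \boldsymbol{z}_{\ell,i}^* \boldsymbol{z}_{\ell,i} - \sigma_{n,\ell}^2 I_M \big), \ k = 1, \ldots, N \Big\} \tag{36}$$

$$\boldsymbol{\mathcal{P}}_i = \mathrm{diag}\Big\{ \sum_{\ell \in \mathcal{N}_k} c_{\ell,k} \big( \boldsymbol{z}_{\ell,i}^* \boldsymbol{n}_{\ell,i} - \sigma_{n,\ell}^2 I_M \big), \ k = 1, \ldots, N \Big\} \tag{37}$$

$$\mathcal{M} = \mathrm{diag}\{\mu_1 I_M, \ldots, \mu_N I_M\} \tag{38}$$

and introduce the following extended combination matrices:

$$\mathcal{A}_1 = A_1 \otimes I_M, \qquad \mathcal{A}_2 = A_2 \otimes I_M, \qquad \mathcal{C} = C \otimes I_M. \tag{39}$$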

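Using these definitions and update equations (29)-(31), it can
be verified that the following relations hold:

$$\tilde{\boldsymbol{\phi}}_{i-1} = \mathcal{A}_1^T \tilde{\boldsymbol{w}}_{i-1}$$

$$\tilde{\boldsymbol{\psi}}_i = \tilde{\boldsymbol{\phi}}_{i-1} - \mathcal{M} \big[ \boldsymbol{\mathcal{R}}_i \tilde{\boldsymbol{\phi}}_{i-1} + \boldsymbol{g}_i - \boldsymbol{\mathcal{P}}_i \omega^o \big]$$

$$\tilde{\boldsymbol{w}}_i = \mathcal{A}_2^T \tilde{\boldsymbol{\psi}}_i \tag{40}$$

where $\omega^o = \mathbb{1}_N \otimes w^o$. From the set of equations given in (40),
it is deduced that the network error vector $\tilde{\boldsymbol{w}}_i$ evolves with
time according to the recursion:

$$\tilde{\boldsymbol{w}}_i = \boldsymbol{\mathcal{B}}_i \tilde{\boldsymbol{w}}_{i-1} - \mathcal{A}_2^T \mathcal{M} \boldsymbol{g}_i + \mathcal{A}_2^T \mathcal{M} \boldsymbol{\mathcal{P}}_i \omega^o \tag{41}$$

where the time-varying matrix $\boldsymbol{\mathcal{B}}_i$ is defined as:

$$\boldsymbol{\mathcal{B}}_i = \mathcal{A}_2^T (I - \mathcal{M} \boldsymbol{\mathcal{R}}_i) \mathcal{A}_1^T. \tag{42}$$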


A. Mean Convergence and Stability 

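Taking the expectation of both sides of (41) and considering
Assumption 1, we arrive at:

$$E[\tilde{\boldsymbol{w}}_i] = \mathcal{B} \, E[\tilde{\boldsymbol{w}}_{i-1}] \tag{43}$$

where in this relation:

$$\mathcal{B} = E[\boldsymbol{\mathcal{B}}_i] = \mathcal{A}_2^T (I - \mathcal{M} \mathcal{R}) \mathcal{A}_1^T \tag{44}$$

$$\mathcal{R} = E[\boldsymbol{\mathcal{R}}_i] = \mathrm{diag}\Big\{ \sum_{\ell \in \mathcal{N}_k} c_{\ell,k} R_{u,\ell}, \ k = 1, \ldots, N \Big\}. \tag{45}$$

To obtain (43), we used the fact that $E[\mathcal{A}_2^T \mathcal{M} \boldsymbol{g}_i] = 0$ because
$\boldsymbol{v}_k(i)$ is independent of $\boldsymbol{z}_{k,i}$ and $E[\boldsymbol{v}_k(i)] = 0$. Moreover, we
have $E[\boldsymbol{\mathcal{P}}_i] = 0$ because $E[\boldsymbol{z}_{k,i}^* \boldsymbol{n}_{k,i}] = \sigma_{n,k}^2 I_M$. According to
(43), $E[\tilde{\boldsymbol{w}}_i] \to 0$ as $i \to \infty$ if $\mathcal{B}$ is stable (i.e., when $\rho(\mathcal{B}) < 1$).
In fact, because $\rho(\mathcal{A}_1) = \rho(\mathcal{A}_2) = 1$ and $\mathcal{R} > 0$, choosing the
step-sizes according to:

$$0 < \mu_k < \frac{2}{\rho\big( \sum_{\ell \in \mathcal{N}_k} c_{\ell,k} R_{u,\ell} \big)} \tag{46}$$

guarantees $\rho(\mathcal{B}) < 1$. We omit the proof; a similar argument
can be found in [49] and [35]. We summarize the
mean-convergence results of the proposed bias-compensated
diffusion LMS in the following.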


Theorem 1. Consider an adaptive network that operates using 
diffusion Algorithms 1 or 2 with the space-time data (1) 
and (2). In this network, if we assume that the regression
noise variances are known or perfectly estimated, the mean
error vector evolves with time according to (43). Furthermore, 
Algorithms 1 and 2 will be asymptotically unbiased and stable 
provided that the step-sizes satisfy (46). 
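For a given network, a step-size bound of the form $2 / \rho(\sum_{\ell} c_{\ell,k} R_{u,\ell})$ can be evaluated numerically. The sketch below is illustrative only: it assumes real symmetric positive semidefinite covariance matrices stored as nested lists, uses a plain power iteration to approximate the spectral radius, and the function names and example values are hypothetical. The power iteration can fail if the starting vector happens to be orthogonal to the dominant eigenvector, so a library eigenvalue routine would be preferable in practice.

```python
def spectral_radius_psd(mat, n_iter=500):
    """Approximate the largest eigenvalue of a small symmetric PSD matrix."""
    m = len(mat)
    x = [1.0 + 0.1 * r for r in range(m)]      # arbitrary non-symmetric start
    lam = 0.0
    for _ in range(n_iter):
        y = [sum(mat[r][c] * x[c] for c in range(m)) for r in range(m)]
        lam = max(abs(v) for v in y)
        if lam == 0.0:
            return 0.0
        x = [v / lam for v in y]               # normalize to unit max-abs entry
    return lam


def step_size_bounds(C, R_u):
    """Upper bounds 2 / rho(sum_l c_{l,k} R_{u,l}) for each node k.

    C[l][k] holds c_{l,k} (zero outside the neighborhood);
    R_u[l] is the M x M regressor covariance of node l.
    """
    n, m = len(R_u), len(R_u[0])
    bounds = []
    for k in range(n):
        S = [[sum(C[l][k] * R_u[l][r][c] for l in range(n)) for c in range(m)]
             for r in range(m)]
        bounds.append(2.0 / spectral_radius_psd(S))
    return bounds


# hypothetical two-node example
R_u = [[[2.0, 1.0], [1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]]
C = [[0.5, 0.5], [0.5, 0.5]]
bounds = step_size_bounds(C, R_u)
print(bounds)
```

Here both nodes see the averaged covariance with largest eigenvalue 2, so the bound evaluates to approximately 1 for each node.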


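Remark 4. In networks with noisy regression data (1), the
estimates generated by the previous diffusion LMS strategies,
such as the ones proposed in [33], [49], are biased, i.e.,
$E[\tilde{\boldsymbol{w}}_i] \neq 0$ as $i \to \infty$. This can be readily shown if we remove
$\sigma_{n,\ell}^2 I_M$ from (36) and (37). In this scenario, (43) will be stable
if

$$0 < \mu_k < \frac{2}{\rho\big( \sum_{\ell \in \mathcal{N}_k} c_{\ell,k} (R_{u,\ell} + \sigma_{n,\ell}^2 I_M) \big)}. \tag{47}$$

Then, for sufficiently small step-sizes satisfying (47), it can
be verified that the estimate of the standard diffusion LMS
deviates from the network optimal solution $\omega^o$ by:

$$\lim_{i \to \infty} E[\tilde{\boldsymbol{w}}_i] = (I_{NM} - \mathcal{B}')^{-1} \mathcal{A}_2^T \mathcal{M} \mathcal{P}' \omega^o \tag{48}$$

where

$$\mathcal{B}' \triangleq \mathcal{A}_2^T (I_{NM} - \mathcal{M} \mathcal{R}') \mathcal{A}_1^T \tag{49}$$

$$\mathcal{R}' \triangleq \mathrm{diag}\Big\{ \sum_{\ell \in \mathcal{N}_k} c_{\ell,k} (R_{u,\ell} + \sigma_{n,\ell}^2 I_M), \ k = 1, \ldots, N \Big\} \tag{50}$$

$$\mathcal{P}' \triangleq \mathrm{diag}\Big\{ \sum_{\ell \in \mathcal{N}_k} c_{\ell,k} \sigma_{n,\ell}^2 I_M, \ k = 1, \ldots, N \Big\}. \tag{51}$$

As is clear from (48), the bias is created by the regression
noise $\{\boldsymbol{n}_{k,i}\}$ only, whereas the noise $\{\boldsymbol{v}_k(i)\}$ has no effect on
generating the bias.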

B. Mean-Square Convergence and Stability 

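To study the mean-square performance of the proposed
algorithms, we first follow the energy conservation arguments
of [33], [45] and determine a variance relation that is suitable
in the current context. The relation can be obtained in the
limit, as $i \to \infty$, by computing the expectation of the weighted
squared norm of (41) under Assumption 1:

$$E\|\tilde{\boldsymbol{w}}_i\|_{\Sigma}^2 = E\big( \|\tilde{\boldsymbol{w}}_{i-1}\|_{\boldsymbol{\Sigma}'}^2 \big) + E[\boldsymbol{g}_i^* \mathcal{M} \mathcal{A}_2 \Sigma \mathcal{A}_2^T \mathcal{M} \boldsymbol{g}_i] + E[\omega^{o*} \boldsymbol{\mathcal{P}}_i^* \mathcal{M} \mathcal{A}_2 \Sigma \mathcal{A}_2^T \mathcal{M} \boldsymbol{\mathcal{P}}_i \omega^o] \tag{52}$$

where $\|x\|_{\Sigma}^2 = x^* \Sigma x$ and $\Sigma \geq 0$ is a weighting matrix that we
are free to choose. Note that (52) is obtained by eliminating
the following terms:

$$E[(\mathcal{A}_2^T \mathcal{M} \boldsymbol{g}_i)^* \Sigma \mathcal{A}_2^T (I - \mathcal{M} \boldsymbol{\mathcal{R}}_i) \mathcal{A}_1^T \tilde{\boldsymbol{w}}_{i-1}] = 0 \tag{53}$$

$$E[(\mathcal{A}_2^T (I - \mathcal{M} \boldsymbol{\mathcal{R}}_i) \mathcal{A}_1^T \tilde{\boldsymbol{w}}_{i-1})^* \Sigma \mathcal{A}_2^T \mathcal{M} \boldsymbol{g}_i] = 0 \tag{54}$$

$$E[(\mathcal{A}_2^T \mathcal{M} \boldsymbol{\mathcal{P}}_i \omega^o)^* \Sigma \mathcal{A}_2^T (I - \mathcal{M} \boldsymbol{\mathcal{R}}_i) \mathcal{A}_1^T \tilde{\boldsymbol{w}}_{i-1}] = 0 \tag{55}$$

$$E[(\mathcal{A}_2^T (I - \mathcal{M} \boldsymbol{\mathcal{R}}_i) \mathcal{A}_1^T \tilde{\boldsymbol{w}}_{i-1})^* \Sigma \mathcal{A}_2^T \mathcal{M} \boldsymbol{\mathcal{P}}_i \omega^o] = 0. \tag{56}$$

These terms are zero, firstly, because $\tilde{\boldsymbol{w}}_{i-1}$ is independent of
$\boldsymbol{g}_i$, $\boldsymbol{\mathcal{R}}_i$ and $\boldsymbol{\mathcal{P}}_i$ under Assumption 1 [50], and secondly, since
the proposed algorithms are unbiased, $E[\tilde{\boldsymbol{w}}_i]$ is zero for large
$i$, if the step-sizes are chosen as in (46).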

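In relation (52), we have:

$$\boldsymbol{\Sigma}' = \boldsymbol{\mathcal{B}}_i^* \Sigma \boldsymbol{\mathcal{B}}_i. \tag{57}$$

It follows from Assumption 1 that $\tilde{\boldsymbol{w}}_{i-1}$ and $\boldsymbol{\mathcal{R}}_i$ are independent
of each other so that

$$E\big( \|\tilde{\boldsymbol{w}}_{i-1}\|_{\boldsymbol{\Sigma}'}^2 \big) = E\|\tilde{\boldsymbol{w}}_{i-1}\|_{E[\boldsymbol{\Sigma}']}^2. \tag{58}$$

Substituting this expression into (52), we arrive at:

$$E\|\tilde{\boldsymbol{w}}_i\|_{\Sigma}^2 = E\|\tilde{\boldsymbol{w}}_{i-1}\|_{\Sigma'}^2 + \mathrm{Tr}[\Sigma \mathcal{A}_2^T \mathcal{M} \mathcal{G} \mathcal{M} \mathcal{A}_2] + \mathrm{Tr}[\Sigma \mathcal{A}_2^T \mathcal{M} \Pi \mathcal{M} \mathcal{A}_2] \tag{59}$$

where

$$\Sigma' = E[\boldsymbol{\mathcal{B}}_i^* \Sigma \boldsymbol{\mathcal{B}}_i]. \tag{60}$$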

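In equation (59), $\mathcal{G} = E[\boldsymbol{g}_i \boldsymbol{g}_i^*]$, which using (35) is given by
(see Appendix A):

$$\mathcal{G} = \mathcal{C}^T \mathrm{diag}\big\{ \sigma_{v,1}^2 (R_{u,1} + \sigma_{n,1}^2 I), \ldots, \sigma_{v,N}^2 (R_{u,N} + \sigma_{n,N}^2 I) \big\} \mathcal{C}. \tag{61}$$

In relation (59), $\Pi = E[\boldsymbol{\mathcal{P}}_i \omega^o \omega^{o*} \boldsymbol{\mathcal{P}}_i^*]$ and its $(k,j)$-th block
is computed as (see Appendix B):

$$\Pi_{k,j} = \sum_{\ell} c_{\ell,k} c_{\ell,j} \Big[ \sigma_{n,\ell}^2 \|w^o\|^2 R_{u,\ell} + \sigma_{n,\ell}^4 \|w^o\|^2 I_M + (\beta - 1) \sigma_{n,\ell}^4 w^o w^{o*} \Big] \tag{62}$$

where $\beta = 2$ for real-valued data and $\beta = 1$ for complex-valued
data. If we introduce $\sigma = \mathrm{bvec}(\Sigma)$ and $\sigma' = \mathrm{bvec}(\Sigma')$
then we can write $\sigma' = \mathcal{F} \sigma$ where

$$\mathcal{F} = E[\boldsymbol{\mathcal{B}}_i^T \otimes_b \boldsymbol{\mathcal{B}}_i^*]. \tag{63}$$

Considering these definitions, the variance relation in (59) can
be rewritten more compactly as:

$$E\|\tilde{\boldsymbol{w}}_i\|_{\sigma}^2 = E\|\tilde{\boldsymbol{w}}_{i-1}\|_{\mathcal{F}\sigma}^2 + \gamma^T \sigma \tag{64}$$

where we are using the notation $\|x\|_{\sigma}^2$ as a short form for
$\|x\|_{\Sigma}^2$, and where

$$\gamma = \mathrm{bvec}\big( \mathcal{A}_2^T \mathcal{M} \mathcal{G}^T \mathcal{M} \mathcal{A}_2 + \mathcal{A}_2^T \mathcal{M} \Pi^T \mathcal{M} \mathcal{A}_2 \big). \tag{65}$$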

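To compute $\mathcal{F}$, we expand $\Sigma'$ from (60) to get:

$$\Sigma' = \mathcal{A}_1 \big( \mathcal{A}_2 \Sigma \mathcal{A}_2^T - \mathcal{R} \mathcal{M} \mathcal{A}_2 \Sigma \mathcal{A}_2^T - \mathcal{A}_2 \Sigma \mathcal{A}_2^T \mathcal{M} \mathcal{R} \big) \mathcal{A}_1^T + E[\mathcal{A}_1 \boldsymbol{\mathcal{R}}_i^* \mathcal{M} \mathcal{A}_2 \Sigma \mathcal{A}_2^T \mathcal{M} \boldsymbol{\mathcal{R}}_i \mathcal{A}_1^T]. \tag{66}$$

The last term in (66) depends on $\mathcal{M}^2$ and can, therefore, be
neglected for small step-sizes. As a result, we obtain

$$\mathcal{F} \approx (\mathcal{A}_1 \otimes_b \mathcal{A}_1)(I - I \otimes_b \mathcal{R}\mathcal{M} - \mathcal{R}^T \mathcal{M} \otimes_b I)(\mathcal{A}_2 \otimes_b \mathcal{A}_2). \tag{67}$$

We can also derive a more compact expression to compute
$\mathcal{F}$. To this end, we first note that the last term in (66) can be
expressed as:

$$E[\mathcal{A}_1 \boldsymbol{\mathcal{R}}_i^* \mathcal{M} \mathcal{A}_2 \Sigma \mathcal{A}_2^T \mathcal{M} \boldsymbol{\mathcal{R}}_i \mathcal{A}_1^T] = \mathcal{A}_1 \mathcal{R}^* \mathcal{M} \mathcal{A}_2 \Sigma \mathcal{A}_2^T \mathcal{M} \mathcal{R} \mathcal{A}_1^T + O(\mathcal{M}^2). \tag{68}$$

Now by substituting (68) into (66) and ignoring the remaining
terms that depend on $\mathcal{M}^2$ under the small step-size condition,
we arrive at:

$$\mathcal{F} \approx \mathcal{B}^T \otimes_b \mathcal{B}^*. \tag{69}$$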

We now proceed to show the stability of the algorithm in
the mean-square error sense, as follows. Using (64), we can
write:

$$\lim_{i \to \infty} E\|\tilde{\boldsymbol{w}}_i\|_{\sigma}^2 = \lim_{i \to \infty} E\|\tilde{\boldsymbol{w}}_{-1}\|_{\mathcal{F}^{i+1}\sigma}^2 + \gamma^T \sum_{j=0}^{\infty} \mathcal{F}^j \sigma. \tag{70}$$

As is evident from this expression, the proposed algorithms
will be stable in the mean-square sense if $\mathcal{F}$ is stable. From
(69), we deduce that $\mathcal{F}$ will be stable if $\mathcal{B}$ is stable. According
to our mean-convergence analysis, the stability of $\mathcal{B}$ is guaranteed
if (46) holds. Therefore, the step-size condition (46) is
sufficient to guarantee the stability of the algorithms both in
the mean and mean-square sense.



C. Mean-Square Steady-State Performance 

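To obtain mean-square error (MSE) steady-state expressions
for the network, we let $i$ go to infinity and use expression (64)
to write:

$$\lim_{i \to \infty} E\|\tilde{\boldsymbol{w}}_i\|_{(I - \mathcal{F})\sigma}^2 = \gamma^T \sigma. \tag{71}$$

By definition, the MSD and EMSE at each node $k$ are
respectively computed as:

$$\eta_k = \lim_{i \to \infty} E\|\tilde{\boldsymbol{w}}_{k,i}\|^2, \qquad \zeta_k = \lim_{i \to \infty} E|\boldsymbol{u}_{k,i} \tilde{\boldsymbol{w}}_{k,i-1}|^2 = \lim_{i \to \infty} E\|\tilde{\boldsymbol{w}}_{k,i}\|_{R_{u,k}}^2. \tag{72}$$

The MSD and EMSE of the nodes can be retrieved from the
network error vector $\tilde{\boldsymbol{w}}_i$ by writing:

$$\eta_k = \lim_{i \to \infty} E\|\tilde{\boldsymbol{w}}_i\|_{\mathrm{diag}(e_k) \otimes I_M}^2 \tag{73}$$

$$\zeta_k = \lim_{i \to \infty} E\|\tilde{\boldsymbol{w}}_i\|_{\mathrm{diag}(e_k) \otimes R_{u,k}}^2 \tag{74}$$

where $e_k$ is a canonical basis vector in $\mathbb{R}^N$ with entry one at
position $k$. From (71) and (73), we can obtain the MSD at
node $k$, for $k \in \{1, 2, \ldots, N\}$:

$$\eta_k = \gamma^T (I - \mathcal{F})^{-1} \mathrm{bvec}(\mathrm{diag}(e_k) \otimes I_M). \tag{75}$$

In the same manner, we compute the EMSE at node $k$ as:

$$\zeta_k = \gamma^T (I - \mathcal{F})^{-1} \mathrm{bvec}(\mathrm{diag}(e_k) \otimes R_{u,k}). \tag{76}$$

The network MSD and EMSE are defined as the average of
MSD and EMSE values over the network, i.e.,

$$\eta = \frac{1}{N} \sum_{k=1}^{N} \eta_k, \qquad \zeta = \frac{1}{N} \sum_{k=1}^{N} \zeta_k. \tag{77}$$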
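The steady-state expression can be checked against simulation in the simplest possible setting: a single node ($N = 1$, $A_1 = A_2 = C = 1$) with real scalar data ($M = 1$). In that case $\gamma$ reduces to $\mu^2(\mathcal{G} + \Pi)$ and, under the small step-size approximation, $\mathcal{F} \approx (1 - \mu R_u)^2$. The sketch below is only an illustration under these assumptions. The scalar noise terms are computed directly for real Gaussian data, and all function names and parameter values are hypothetical.

```python
import random


def theoretical_msd(mu, r_u, sigma_n2, sigma_v2, w_o):
    """Scalar single-node steady-state MSD, gamma / (1 - F)."""
    g = sigma_v2 * (r_u + sigma_n2)                  # E[(z v)^2]
    # E[((z n - sigma_n^2) w_o)^2] for real Gaussian data
    pi_term = sigma_n2 * (r_u + sigma_n2) * w_o ** 2 + sigma_n2 ** 2 * w_o ** 2
    gamma = mu ** 2 * (g + pi_term)
    f = (1.0 - mu * r_u) ** 2                        # small step-size approximation
    return gamma / (1.0 - f)


def simulated_msd(mu, r_u, sigma_n2, sigma_v2, w_o,
                  n_iter=200000, burn_in=20000, seed=3):
    """Time-averaged squared error of the bias-compensated LMS recursion."""
    rng = random.Random(seed)
    w, acc = 0.0, 0.0
    for i in range(n_iter):
        u = rng.gauss(0.0, r_u ** 0.5)
        z = u + rng.gauss(0.0, sigma_n2 ** 0.5)
        d = u * w_o + rng.gauss(0.0, sigma_v2 ** 0.5)
        w += mu * (z * (d - z * w) + sigma_n2 * w)   # bias-compensated update
        if i >= burn_in:
            acc += (w_o - w) ** 2
    return acc / (n_iter - burn_in)


params = dict(mu=0.01, r_u=1.0, sigma_n2=0.5, sigma_v2=0.01, w_o=1.0)
msd_theory = theoretical_msd(**params)
msd_sim = simulated_msd(**params)
print(msd_theory, msd_sim)
```

The two values should agree to within the accuracy of the small step-size approximation and the finite averaging window. Note that the regression noise dominates the steady-state error here, since it enters multiplied by $\|w^o\|^2$.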


D. Mean-Square Transient Behavior 

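We use (64) to obtain an expression for the mean-square
behavior of the algorithm in the transient state. In this expression,
if we substitute $\boldsymbol{w}_{k,-1} = 0$, $\forall k \in \{1, \ldots, N\}$, we obtain:

$$E\|\tilde{\boldsymbol{w}}_i\|_{\sigma}^2 = \|\omega^o\|_{\mathcal{F}^{i+1}\sigma}^2 + \gamma^T \sum_{j=0}^{i} \mathcal{F}^j \sigma. \tag{78}$$

Writing this recursion for $i - 1$ and subtracting it from (78) leads
to:

$$E\|\tilde{\boldsymbol{w}}_i\|_{\sigma}^2 = E\|\tilde{\boldsymbol{w}}_{i-1}\|_{\sigma}^2 - \|\omega^o\|_{\mathcal{F}^i (I - \mathcal{F})\sigma}^2 + \gamma^T \mathcal{F}^i \sigma. \tag{79}$$

By replacing $\sigma$ with $\sigma_{\mathrm{msd}_k} = \mathrm{bvec}(\mathrm{diag}\{e_k\} \otimes I_M)$ and
$\sigma_{\mathrm{emse}_k} = \mathrm{bvec}(\mathrm{diag}\{e_k\} \otimes R_{u,k})$ and using $\boldsymbol{w}_{k,-1} = 0$, we
arrive at the following two recursions for the evolution of MSD
and EMSE over time:

$$\eta_k(i) = \eta_k(i-1) - \|\omega^o\|_{\mathcal{F}^i (I - \mathcal{F}) \sigma_{\mathrm{msd}_k}}^2 + \gamma^T \mathcal{F}^i \sigma_{\mathrm{msd}_k} \tag{80}$$

$$\zeta_k(i) = \zeta_k(i-1) - \|\omega^o\|_{\mathcal{F}^i (I - \mathcal{F}) \sigma_{\mathrm{emse}_k}}^2 + \gamma^T \mathcal{F}^i \sigma_{\mathrm{emse}_k}. \tag{81}$$

The MSD and EMSE of the network can be computed either
by averaging the transient behavior of the nodes, or by substituting

$$\sigma_{\mathrm{msd}} = \frac{1}{N} \mathrm{bvec}(I_{MN}) \tag{82}$$

$$\sigma_{\mathrm{emse}} = \frac{1}{N} \mathrm{bvec}(\mathrm{diag}\{R_{u,1}, \ldots, R_{u,N}\}) \tag{83}$$

in recursion (79). We summarize the mean-square analysis
results of the algorithms in the following: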

Theorem 2. Consider an adaptive network operating under
bias-compensated diffusion Algorithm 1 or 2 with the space-time
data (1) and (2) that satisfy Assumption 1. In this network,
if we assume that the regression noise variances are known or
perfectly estimated and nodes initialize at zero, then the MSD
and EMSE of each node $k$ evolve with time according to (80)
and (81) and the network MSD and EMSE follow the recursions:

$$\eta(i) = \eta(i-1) - \|\omega^o\|_{\mathcal{F}^i (I - \mathcal{F}) \sigma_{\mathrm{msd}}}^2 + \gamma^T \mathcal{F}^i \sigma_{\mathrm{msd}}$$

$$\zeta(i) = \zeta(i-1) - \|\omega^o\|_{\mathcal{F}^i (I - \mathcal{F}) \sigma_{\mathrm{emse}}}^2 + \gamma^T \mathcal{F}^i \sigma_{\mathrm{emse}}$$

where $\sigma_{\mathrm{msd}}$ and $\sigma_{\mathrm{emse}}$ are defined in (82) and (83) and $\mathcal{F}$ is
given by (63). Moreover, if the step-sizes are chosen to satisfy
(46), the network will be stable, converge in the mean and
mean-square sense and reach the steady-state MSD and EMSE
characterized by (77).
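The learning-curve recursion is easy to iterate numerically. As a minimal sketch, consider the scalar single-node case in which $\mathcal{F}$, $\gamma$ and $\sigma$ are all scalars (with $\sigma = 1$) and the estimate is initialized at zero, so that the initial squared error equals $(w^o)^2$. The code below iterates the recursion and compares it with the summed closed form from which it is derived. The function names and the numerical values of `f` and `gamma` are hypothetical.

```python
def msd_learning_curve(f, gamma, w_o, n_iter):
    """Iterate eta(i) = eta(i-1) - w_o^2 F^i (1 - F) + gamma F^i, i = 0, 1, ..."""
    eta = w_o ** 2            # eta(-1): squared error of the zero initial estimate
    f_pow = 1.0               # holds F^i
    curve = []
    for _ in range(n_iter):
        eta = eta - w_o ** 2 * f_pow * (1.0 - f) + gamma * f_pow
        curve.append(eta)
        f_pow *= f
    return curve


def msd_closed_form(f, gamma, w_o, i):
    """Summed form: w_o^2 F^(i+1) + gamma * sum_{j=0}^{i} F^j."""
    return w_o ** 2 * f ** (i + 1) + gamma * sum(f ** j for j in range(i + 1))


f, gamma, w_o = 0.98, 1.0e-4, 1.0
curve = msd_learning_curve(f, gamma, w_o, n_iter=2000)
steady_state = gamma / (1.0 - f)
print(curve[0], curve[-1], steady_state)
```

For a stable `f` (magnitude below one) the curve decays geometrically from the initial error toward the steady-state value `gamma / (1 - f)`.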

V. Regression Noise Variance Estimation

In the proposed algorithms, each node still needs to have
the regression noise variances, $\{\sigma_{n,\ell}^2\}_{\ell \in \mathcal{N}_k}$, to evaluate the
stochastic gradient vector, $\widehat{\nabla} J_\ell$. In practice, such information
is rarely available and is normally obtained through estimation.
A review of previous works reveals that the regression noise
variances can be either estimated off-line [43], or in real time
when the unknown parameter vector, $w^o$, is being estimated
[51], [52]. For example, in the context of speech analysis,
they can be estimated off-line during silent periods in between
words and sentences [43]. In some other applications, these
variances are estimated during the operation of the algorithm
using the second-order moments of the regression data and the
system output signal [51], [52]. In what follows, we propose
an adaptive recursive approach to estimate the regression noise
variances without using the second-order moments of the data.

The variance of the regression noise at each node is classified 
as local information and, hence, it can be estimated from 
the node’s local data. When the regression data at node $k$ are not 
corrupted by measurement noise (i.e., $z_{k,i} = u_{k,i}$), and when 
the node operates independently of all other nodes to estimate 
$w^o$ by minimizing $\mathbb{E}|d_k(i) - u_{k,i} w|^2$, the minimum attainable 
MSE can be expressed as [45]: 

$$J_{\min} = \sigma^2_{d,k} - r^*_{du,k} R^{-1}_{u,k} r_{du,k}. \qquad (84)$$

Under noisy regression scenarios where node $k$ operates independently 
to minimize the cost (16), the minimum achievable 
cost will still be (84). To verify this, we note from Remark 
2 that since $J_k(w)$ is positive definite and, hence, strongly 
convex, its unique minimizer under Assumption 1 will be $w^o$. 
Therefore, substituting $w^o$ into (16) will give its minimum, 
i.e.: 

$$\begin{aligned}
\min_{w} J_k(w) &= \mathbb{E}|d_k(i) - z_{k,i} w^o|^2 - \sigma^2_{n,k}\|w^o\|^2 \\
&= \sigma^2_{d,k} - r^*_{du,k} R^{-1}_{u,k} r_{du,k} \qquad (85) \\
&= J_{\min}. \qquad (86)
\end{aligned}$$

We use this result to estimate the regression noise variance 
$\sigma^2_{n,k}$ at each node $k$. 
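
The value of this common minimum can be worked out explicitly. The short derivation below is only a sketch under assumptions that are not restated in this section: model (1) is taken to be $z_{k,i} = u_{k,i} + n_{k,i}$, model (2) is taken to be $d_k(i) = u_{k,i} w^o + v_k(i)$, the noises $v_k(i)$ and $n_{k,i}$ are zero-mean, mutually independent and independent of $u_{k,i}$, the covariance of $n_{k,i}$ is $\sigma^2_{n,k} I$, and $r_{du,k}$ is read as $\mathbb{E}[u^*_{k,i} d_k(i)]$.

```latex
\begin{align*}
\sigma^2_{d,k} &= w^{o*} R_{u,k} w^o + \sigma^2_{v,k},
\qquad r_{du,k} = R_{u,k} w^o, \\
J_{\min} &= \sigma^2_{d,k} - r^*_{du,k} R^{-1}_{u,k} r_{du,k}
          = w^{o*} R_{u,k} w^o + \sigma^2_{v,k} - w^{o*} R_{u,k} w^o
          = \sigma^2_{v,k}, \\
\mathbb{E}|d_k(i) - z_{k,i} w^o|^2
         &= \mathbb{E}|v_k(i) - n_{k,i} w^o|^2
          = \sigma^2_{v,k} + \sigma^2_{n,k}\|w^o\|^2 .
\end{align*}
```

Subtracting $\sigma^2_{n,k}\|w^o\|^2$ from the last line leaves $\sigma^2_{v,k}$, which coincides with $J_{\min}$ in the noise-free case. Under these assumptions, the bias-compensated cost therefore removes exactly the contribution of the regression noise.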



TABLE I: Network signal and noise power profile 

| Node $k$ | $\sigma^2_{v,k}$ | $\mathrm{Tr}(R_{u,k})$ | $\sigma^2_{n,k}$ |
|----------|------------------|------------------------|------------------|
| 1        | 0.0230           | 0.3000                 | 0.0170           |
| 2        | 0.0020           | 0.7500                 | 0.0970           |
| 3        | 0.0160           | 0.5250                 | 0.0620           |
| 4        | 0.0040           | 0.4250                 | 0.0570           |
| 5        | 0.0420           | 0.6000                 | 0.0600           |
| 6        | 0.0400           | 0.6500                 | 0.0730           |
| 7        | 0.0120           | 1.0000                 | 0.0560           |
| 8        | 0.0120           | 0.7750                 | 0.0860           |
| 9        | 0.0310           | 0.7250                 | 0.0250           |
| 10       | 0.0280           | 0.6750                 | 0.0490           |
| 11       | 0.0350           | 0.6500                 | 0.0680           |
| 12       | 0.0500           | 0.6000                 | 0.0760           |
| 13       | 0.0090           | 0.2750                 | 0.0600           |
| 14       | 0.0340           | 0.3500                 | 0.0150           |
| 15       | 0.0290           | 0.6250                 | 0.0160           |
| 16       | 0.0280           | 0.9250                 | 0.0490           |
| 17       | 0.0020           | 0.3250                 | 0.0830           |
| 18       | 0.0080           | 0.8750                 | 0.0370           |
| 19       | 0.0410           | 0.2500                 | 0.0170           |
| 20       | 0.0460           | 0.8000                 | 0.0160           |

Fig. 2: Network topology used in the simulations. 

Now, let us introduce 

$$e_k(i) = d_k(i) - z_{k,i} w_{k,i-1} \qquad (87)$$

where $w_{k,i-1}$ is the weight estimate from ATC diffusion 
(which would be replaced by $\psi_{k,i-1}$ for CTA diffusion). 
Considering $J_k(w_{k,i-1})$, for sufficiently small step-sizes and 
in the limit when the weight estimate is close enough to $w^o$, 
it holds that: 


$$\mathbb{E}|e_k(i)|^2 - \sigma^2_{n,k}\|w^o\|^2 \approx J_{\min}. \qquad (88)$$

From (2) and (84), it can be verified that $J_{\min} = \sigma^2_{v,k}$ and 
hence from (88), we can write: 

$$\mathbb{E}|e_k(i)|^2 \approx \sigma^2_{v,k} + \sigma^2_{n,k}\|w^o\|^2. \qquad (89)$$

In this relation, $\sigma^2_{v,k}$ can be ignored if $\sigma^2_{n,k}\|w^o\|^2 \gg \sigma^2_{v,k}$. 
Under such circumstances, if we assume $\|w^o\|^2 \neq 0$, which 
is true for systems with at least one non-zero coefficient, then 
the variance of the regression noise can be obtained by: 

$$\sigma^2_{n,k} = \frac{\mathbb{E}|e_k(i)|^2}{\|w^o\|^2}. \qquad (90)$$

Since, in (90), $\mathbb{E}|e_k(i)|^2$ and the unknown parameter, $w^o$, are 
initially unavailable, we can estimate $\sigma^2_{n,k}$ using the following 
relations as the latest estimates of these quantities become 
available, i.e., 

$$f_k(i) = \alpha f_k(i-1) + (1-\alpha)|e_k(i)|^2 \qquad (91)$$

$$\hat{\sigma}^2_{n,k}(i) = \frac{f_k(i)}{\|w_{k,i-1}\|^2} \qquad (92)$$

where $0 \ll \alpha < 1$ is a smoothing factor with nominal values 
in the range of [0.95, 0.99]. 

Assumption 2. The regression noise variance, $\sigma^2_{n,k}$, and 
the output measurement noise variance, $\sigma^2_{v,k}$, satisfy the following 
inequality: 

$$\sigma^2_{n,k}\|w^o\|^2 \gg \sigma^2_{v,k}. \qquad (93)$$

Under this assumption, the regressor noise variance at each 
node $k$ can be adaptively estimated via (91) and (92) using 
the data samples $e_k(i)$ and $w_{k,i-1}$ supplied from the bias-compensated 
LMS iterations. 

VI. Simulation Results 

In this section, we present computer experiments to illustrate 
the efficiency of the proposed algorithms and to verify the 
theoretical findings. We evaluate the algorithm performance 
for known regressor noise variance and with adaptive noise 
variance estimation. We consider a connected network with 
N = 20 nodes that are positioned randomly on a unit 
square area with a maximum communication distance of 0.4 unit 
length. The network topology is shown in Fig. 2. We choose 
$A_1 = I$, compute $A_2$ using the relative-variance rule (26) 
and choose the matrix $C$ according to the Metropolis criterion 
[31], [49]. In the plots, we use $A_{\mathrm{rel}}$ and $C_{\mathrm{met}}$ to refer to this 
particular choice of $A_2$ and $C$. The network data are generated 
according to model (1) and (2). The aim is to estimate the 
system parameter vector = [ 1 , js/2 over the network 
using the proposed bias-compensated diffusion algorithms. In 
all our experiments, the curves from the simulation results are 
drawn from the average of 500 independent runs. 
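
As an illustration of this setup, the following sketch generates a random connected topology and builds a combination matrix with a Metropolis-type rule. It is not the authors' code. The weight rule used here, $c_{\ell,k} = 1/\max(n_k, n_\ell)$ for neighbors $\ell \neq k$ with the diagonal entry completing each column to one and $n_k$ counting node $k$ itself, is one common form of the Metropolis rule and is assumed rather than taken from [31], [49]. The function names, the random seed and the resampling loop used to obtain a connected graph are likewise hypothetical.

```python
import numpy as np

def random_geometric_adjacency(num_nodes, radius, rng):
    """Place nodes uniformly on the unit square; link pairs within `radius`."""
    pos = rng.random((num_nodes, 2))
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    adj = dist <= radius              # symmetric, includes self-loops
    return pos, adj

def is_connected(adj):
    """Depth-first search from node 0 over the adjacency matrix."""
    seen = {0}
    stack = [0]
    while stack:
        k = stack.pop()
        for l in np.flatnonzero(adj[k]):
            l = int(l)
            if l not in seen:
                seen.add(l)
                stack.append(l)
    return len(seen) == adj.shape[0]

def metropolis_weights(adj):
    """Assumed rule: c[l, k] = 1 / max(n_k, n_l) for neighbors l != k."""
    n = adj.sum(axis=1)               # neighborhood sizes, node itself included
    num_nodes = adj.shape[0]
    C = np.zeros((num_nodes, num_nodes))
    for k in range(num_nodes):
        for l in range(num_nodes):
            if l != k and adj[l, k]:
                C[l, k] = 1.0 / max(n[k], n[l])
        C[k, k] = 1.0 - C[:, k].sum()  # diagonal completes the column to one
    return C

rng = np.random.default_rng(1)        # hypothetical seed
while True:                           # resample until the graph is connected
    pos, adj = random_geometric_adjacency(20, 0.4, rng)
    if is_connected(adj):
        break
C = metropolis_weights(adj)
```

With this rule the resulting matrix is symmetric and doubly stochastic, so every node weights its neighbors' data with nonnegative coefficients that sum to one.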

We choose the step-sizes as $\mu_k = 0.05$, and set $w_{k,-1} = [0, 0]^T$ 
for all $k$. We adopt the Gaussian distribution to generate 
$v_k(i)$, $n_{k,i}$ and $u_{k,i}$. The covariance matrices of the regression 
data and the regression noise are of the form $R_{u,k} = \sigma^2_{u,k} I_M$ 
and $\sigma^2_{n,k} I_M$, respectively. The network signal and noise power 
profile is given in Table I. 

a) Transient MSE Results with Perfect Noise Variance 
Estimation: In Fig. 3, we demonstrate the network transient 
behavior in terms of MSD and EMSE for the proposed 
diffusion LMS algorithm, the standard diffusion LMS algorithm 
[33] and the non-cooperative mode of the proposed algorithm. 
Note that $A_2 = I$ and $C = I$ correspond to the non-cooperative 
network mode of the proposed algorithm, where 
each node runs a stand-alone bias-compensated LMS. As the 
results indicate, the performance of the cooperative network 
with $C_{\mathrm{met}}$ and $A_{\mathrm{rel}}$ exceeds that of the non-cooperative case by 
12 dB. We also observe that the proposed algorithm outperforms 
the standard diffusion LMS [33] by more than 12 dB. 
It is interesting to note that the non-cooperative algorithm 
outperforms the standard diffusion LMS by about 1 dB. 

Fig. 3: Convergence behavior of the proposed bias-compensated 
diffusion LMS, standard diffusion LMS and non-cooperative LMS 
algorithms. 

Fig. 4: MSD learning curves of nodes 5 and 15 and EMSE learning 
curves of nodes 4 and 18. 

Fig. 5: Network steady-state MSD for different combination matrices. 

Fig. 6: Network steady-state EMSE for different combination matrices. 

We also present the EMSE and MSD of some randomly 
chosen nodes in Fig. 4. In particular, we plot the EMSE 
learning curves of nodes 4 and 18 and the MSD learning 
curves of nodes 5 and 15. We observe that the MSD curves of 
the chosen nodes are identical. Since the algorithm is unbiased, 
this implies that these nodes have reached agreement about 
the unknown network parameter, w°. As we will show in the 
steady-state results, all nodes over the network almost reach 
agreement. We note that, in all scenarios, there is a good 
agreement between simulations and the analysis results. 
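
To make the role of bias compensation concrete, the following single-node sketch compares a stand-alone LMS filter with and without the compensation term. It is an illustration under assumptions and does not reproduce the network experiment: the update $w_i = w_{i-1} + \mu[z_i^*(d(i) - z_i w_{i-1}) + \sigma_n^2 w_{i-1}]$ is assumed here as the stochastic-gradient step for the cost $\mathbb{E}|d(i) - z_i w|^2 - \sigma_n^2\|w\|^2$, the noise variance is assumed known, and the signal powers, the vector `w_o`, the step-size and the seed are hypothetical values.

```python
import numpy as np

def cgauss(rng, var, size):
    """Circular complex Gaussian samples with variance `var` per entry."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(size)
                                 + 1j * rng.standard_normal(size))

def run_lms(d, z, mu, sigma2_n, w_true):
    """LMS with row regressors z[i]; sigma2_n = 0 gives the standard LMS.

    Returns the squared deviation ||w_true - w_i||^2 at every iteration.
    """
    w = np.zeros(z.shape[1], dtype=complex)
    dev = np.empty(len(d))
    for i in range(len(d)):
        e = d[i] - z[i] @ w
        # Assumed bias-compensated step: the extra term sigma2_n * w
        # cancels the regression-noise contribution on average.
        w = w + mu * (np.conj(z[i]) * e + sigma2_n * w)
        dev[i] = np.sum(np.abs(w_true - w) ** 2)
    return dev

rng = np.random.default_rng(0)                  # hypothetical seed
M, T, mu = 2, 20000, 0.01
sigma2_u, sigma2_n, sigma2_v = 1.0, 0.2, 0.01   # hypothetical powers
w_o = np.array([1.0 + 0.0j, 0.0 + 1.0j])        # hypothetical parameter vector

u = cgauss(rng, sigma2_u, (T, M))               # clean regressors
n = cgauss(rng, sigma2_n, (T, M))               # regression noise
v = cgauss(rng, sigma2_v, T)                    # output measurement noise
d = u @ w_o + v
z = u + n                                       # noisy regressors seen by the node

# Time-averaged squared deviation over the second half of the run.
msd_std = run_lms(d, z, mu, 0.0, w_o)[T // 2:].mean()
msd_bc = run_lms(d, z, mu, sigma2_n, w_o)[T // 2:].mean()
print(f"standard LMS: {10 * np.log10(msd_std):.1f} dB, "
      f"bias-compensated LMS: {10 * np.log10(msd_bc):.1f} dB")
```

In this toy setting the standard LMS settles near a scaled-down version of `w_o`, so its deviation is dominated by the bias, whereas the compensated filter is limited only by gradient noise.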

b) Steady-State MSE Results with Perfect Noise Variance 
Estimation: The network steady-state MSD and EMSE are 
shown in Figs. 5 and 6. From these figures, we observe that 
there is a good agreement between simulations and analytical 
findings. In addition, we consider the case when nodes only 
exchange their intermediate estimates (i.e., when $C = I$). 
It is seen that the MSD performance of the algorithm with 
$C_{\mathrm{met}}$ is 1 dB better than that with $C = I$. We also observe 
that the performance discrepancy between nodes in terms of 
MSD is less than 0.5 dB for cooperative scenarios, while in 
the non-cooperative scenario it is more than 5 dB. This shows 
agreement in the network in spite of different noise and energy 
profiles at each node. Note that the fluctuations in EMSE over 
the network are due to differences in energy level in the nodes’ 
input signals, but this does not preclude the cooperating nodes 
from reaching a consensus in the estimated parameters. 

Fig. 7: Steady-state network EMSE with known and estimated 
regressor noise variances. 

Fig. 8: Steady-state network MSD with known and estimated 
regressor noise variances. 

Fig. 9: EMSE tracking performance with known and estimated 
regressor noise variances. 

Fig. 10: The estimated and true values of the regression noise 
variance over the network. 

c) MSE Results of the Algorithm with Adaptive Noise 
Variance Estimation: We compare the transient and steady-state 
behavior of the bias-compensated diffusion LMS with 
known regressor noise variances and with adaptive noise variance 
estimation. For this experiment, we consider the same network 
topology and noise profile as above. However, the 
unknown parameter vector to be estimated, in this case, is 
$w^o = 2\,\mathbb{1}_5 + 2j\,\mathbb{1}_5$, where $\mathbb{1}_M$ is an $M \times 1$ column vector 
with unit entries. The network energy profile is chosen as 
$\mathrm{Tr}(R_{u,k}) = 20\,\mathrm{Tr}(\sigma^2_{n,k} I)$. With these choices, Assumption 
2 is satisfied. We set $\alpha = 0.99$ and $\mu_k = 0.01$ for all $k$. 

Figs. 7 and 8 show the steady-state EMSE and MSD of 
the network for these two cases. The steady-state values are 
obtained by averaging over the last 200 samples after initial 
convergence. We observe that the performance of the proposed 
bias-compensated LMS algorithm with adaptive noise variance 
estimation is almost identical to that of the ideal case with 
known noise variances. 

Fig. 9 illustrates the tracking performance of the bias-compensated 
diffusion LMS algorithm for these two cases for 
a sudden change in the unknown parameter $w^o$ and compares 
the results with those of the standard diffusion LMS algorithm 
given in [33]. The variation in the unknown parameter vector 
occurs at iteration $i = 550$, when $w^o$ changes to $2w^o$. 
Conclusions similar to those for Figs. 7 and 8 can be drawn for the 
proposed algorithms with known and estimated regression 
noise variances. We also observe that the proposed algorithms 
outperform the standard diffusion LMS [33] by nearly 10 dB 
in steady-state. 

Fig. 10 illustrates the results of regression noise variance 
estimation in the steady state. In this experiment, we observe 
that for $i > 350$, $\mathbb{E}[\hat{\sigma}^2_{n,k}(i)] \to \sigma^2_{n,k}$. This indicates that the 
proposed adaptive estimation strategy for computing the 
nodes’ regression noise variances over the network works well. 
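
The adaptive estimator can be sketched in a few lines. The code below is a hedged illustration, not the authors' implementation: it smooths $|e_k(i)|^2$ with the factor $\alpha$ and divides by the squared norm of the current weight estimate, following the recursive scheme of Section V. To isolate the estimator, the weight estimate is frozen at `w_o`, which idealizes the regime where the adaptive filter is already close to the true parameter. The function name, the guard `eps` against a zero-norm weight vector, the signal powers and the seed are all hypothetical.

```python
import numpy as np

def estimate_regression_noise_variance(errors, weights, alpha=0.99, eps=1e-12):
    """Smooth |e(i)|^2 with factor alpha, then divide by ||w_{i-1}||^2.

    errors  : length-T sequence of error samples e(i)
    weights : T x M array whose i-th row is the weight estimate w_{i-1}
    eps     : assumed guard against division by zero (e.g. zero initialization)
    """
    f = 0.0
    estimates = np.empty(len(errors))
    for i, (e, w) in enumerate(zip(errors, weights)):
        f = alpha * f + (1.0 - alpha) * abs(e) ** 2
        estimates[i] = f / max(np.vdot(w, w).real, eps)
    return estimates

rng = np.random.default_rng(0)                    # hypothetical seed
M, T = 5, 20000
sigma2_u, sigma2_n, sigma2_v = 1.0, 0.05, 0.01    # hypothetical powers
w_o = 2.0 * np.ones(M) + 2.0j * np.ones(M)        # parameter vector of experiment c)

def cgauss(var, size):
    """Circular complex Gaussian samples with variance `var` per entry."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(size)
                                 + 1j * rng.standard_normal(size))

u = cgauss(sigma2_u, (T, M))
n = cgauss(sigma2_n, (T, M))
v = cgauss(sigma2_v, T)
d = u @ w_o + v
z = u + n

e = d - z @ w_o                      # error with the weight estimate frozen at w_o
weights = np.tile(w_o, (T, 1))
sigma2_hat = estimate_regression_noise_variance(e, weights, alpha=0.99)
steady = sigma2_hat[T // 2:].mean()  # average over the second half of the run
print(f"true variance: {sigma2_n:.4f}, estimated: {steady:.4f}")
```

Because $\|w^o\|^2 = 40$ here, the residual offset $\sigma_v^2/\|w^o\|^2$ left by neglecting the output noise is small compared with $\sigma_n^2$, which is the situation Assumption 2 describes.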

VII. Conclusion 

We developed bias-compensated diffusion LMS strategies 
for parameter estimation over sensor networks where the regression 
data are corrupted with additive noise. The algorithms 
operate in a distributed manner and exchange data via single-hop 
communication to save energy and communication resources. 
The proposed algorithms estimate the regression noise 
variances and use them to remove the bias from the estimate. 
In the analysis, it has been shown that the proposed bias-compensated 
diffusion algorithms are unbiased and converge 
in the mean and mean-square error sense for sufficiently 
small step-sizes. We carried out computer experiments that 
confirmed the effectiveness of the algorithms and supported the 
analytical findings. 

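Appendix A 

COMPUTATION OF $\mathcal{G}$ 

This can be computed by substituting $g(i)$ from (35) into 
$\mathcal{G} = \mathbb{E}[g(i) g^*(i)]$; as a result: 

$$\mathcal{G} = \mathcal{C}^T\, \mathbb{E}\left[ \begin{bmatrix} z^*_{1,i} v_1(i) \\ \vdots \\ z^*_{N,i} v_N(i) \end{bmatrix} \begin{bmatrix} z^*_{1,i} v_1(i) \\ \vdots \\ z^*_{N,i} v_N(i) \end{bmatrix}^{*} \right] \mathcal{C}.$$

The $(k,j)$-th block of the above matrix can be computed as: 

$$\mathbb{E}[z^*_{k,i} v_k(i) v^*_j(i) z_{j,i}] = \begin{cases} \sigma^2_{v,k}(R_{u,k} + \sigma^2_{n,k} I), & k = j \\ 0, & k \neq j \end{cases}$$

and (61) follows. 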


Appendix B 

COMPUTATION OF $\Pi$ 

We rewrite $\Pi$ as: 


n = E[Vinv*] 


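where $\Omega$ denotes the block matrix whose $(k,j)$-th block, $\Omega_{k,j}$, is given in (104). The $(k,j)$-th block of $\Pi$ can be computed 
as: 

$$\Pi_{k,j} = \mathbb{E} \sum_{\ell} \sum_{m} c_{\ell,k} c_{m,j} \big(z^*_{\ell,i} n_{\ell,i} - \sigma^2_{n,\ell} I\big)\, \Omega_{k,j}\, \big(n^*_{m,i} z_{m,i} - \sigma^2_{n,m} I\big). \qquad (97)$$

We use (1) to replace $z_{\ell,i}$ and $z_{m,i}$: 

$$\Pi_{k,j} = \mathbb{E} \sum_{\ell} \sum_{m} c_{\ell,k} c_{m,j} \big((u_{\ell,i} + n_{\ell,i})^* n_{\ell,i} - \sigma^2_{n,\ell} I\big)\, \Omega_{k,j}\, \big(n^*_{m,i}(u_{m,i} + n_{m,i}) - \sigma^2_{n,m} I\big). \qquad (98)$$

This leads to: 

$$\begin{aligned}
\Pi_{k,j} ={}& \sum_{\ell} \sum_{m} c_{\ell,k} c_{m,j}\, \mathbb{E}[u^*_{\ell,i} n_{\ell,i} \Omega_{k,j} n^*_{m,i} u_{m,i}] \\
&+ \sum_{\ell} \sum_{m} c_{\ell,k} c_{m,j}\, \mathbb{E}[n^*_{\ell,i} n_{\ell,i} \Omega_{k,j} n^*_{m,i} n_{m,i}] \\
&- \sum_{\ell} \sum_{m} c_{\ell,k} c_{m,j}\, \mathbb{E}[\sigma^2_{n,\ell} \Omega_{k,j} n^*_{m,i} n_{m,i}]. \qquad (100)
\end{aligned}$$

If we assume that the regression data $\{u_{k,i}\}$ and noise $\{n_{k,i}\}$ 
are zero-mean circular Gaussian complex-valued vectors with 
uncorrelated entries, then: 

$$\mathbb{E}[u^*_{\ell,i} n_{\ell,i} \Omega_{k,j} n^*_{m,i} u_{m,i}] = \begin{cases} 0, & \ell \neq m \\ \sigma^2_{n,\ell}\, \mathrm{Tr}(\Omega_{k,j}) R_{u,\ell}, & \ell = m \end{cases} \qquad (101)$$

$$\mathbb{E}[n^*_{\ell,i} n_{\ell,i} \Omega_{k,j} n^*_{m,i} n_{m,i}] = \begin{cases} \sigma^2_{n,\ell}\, \Omega_{k,j}\, \sigma^2_{n,m}, & \ell \neq m \\ \beta \sigma^4_{n,\ell}\, \Omega_{k,j} + \sigma^4_{n,\ell}\, \mathrm{Tr}(\Omega_{k,j}) I, & \ell = m \end{cases} \qquad (102)$$

and 

$$\mathbb{E}[\sigma^2_{n,\ell} \Omega_{k,j} n^*_{m,i} n_{m,i}] = \sigma^2_{n,\ell}\, \Omega_{k,j}\, \sigma^2_{n,m}. \qquad (103)$$

We note that 

$$\Omega_{k,j} = w^o_k w^{o*}_j \qquad (104)$$

where $w^o_k = w^o_j = w^o$, $\forall k,j \in \{1, 2, \ldots, N\}$. Therefore, 

$$\Omega_{\ell,k} = \Omega_{m,n}, \quad \forall \ell,k,m,n \in \{1, 2, \ldots, N\} \qquad (105)$$

and $\mathrm{Tr}(\Omega_{k,j}) = \|w^o\|^2$. As a result: 

$$\Pi_{k,j} = \sum_{\ell} c_{\ell,k} c_{\ell,j} \Big\{ \sigma^2_{n,\ell} \|w^o\|^2 \big(R_{u,\ell} + \sigma^2_{n,\ell} I\big) + (\beta - 1)\, \sigma^4_{n,\ell}\, w^o w^{o*} \Big\}. \qquad (106)$$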
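
The bracketed term in (106) can be checked numerically for a single node. The Monte Carlo sketch below is only a sanity check under stated assumptions: the data are circular complex Gaussian with $R_u = \sigma_u^2 I$, and $\beta = 1$ is assumed for this complex-valued case, since $\beta$ is not defined in this appendix. The signal powers, the vector `w_o`, the sample size and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)                   # hypothetical seed
M, T = 2, 200_000
sigma2_u, sigma2_n = 1.0, 0.5                    # hypothetical powers
w_o = np.array([1.0 + 1.0j, -1.0 + 1.0j]) / np.sqrt(2.0)  # hypothetical vector

def cgauss(var, size):
    """Circular complex Gaussian samples with variance `var` per entry."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(size)
                                 + 1j * rng.standard_normal(size))

u = cgauss(sigma2_u, (T, M))                     # rows are regressors u_i
n = cgauss(sigma2_n, (T, M))                     # rows are noise vectors n_i
z = u + n

# x_i = (z_i^* n_i - sigma2_n I) w_o, stored as rows of an array of shape (T, M)
s = n @ w_o                                      # scalars n_i w_o
x = np.conj(z) * s[:, None] - sigma2_n * w_o
empirical = x.T @ np.conj(x) / T                 # sample average of x_i x_i^*

norm2 = np.vdot(w_o, w_o).real                   # ||w_o||^2
R_u = sigma2_u * np.eye(M)
beta = 1.0                                       # assumed for circular complex data
predicted = (sigma2_n * norm2 * (R_u + sigma2_n * np.eye(M))
             + (beta - 1.0) * sigma2_n ** 2 * np.outer(w_o, np.conj(w_o)))

max_err = np.abs(empirical - predicted).max()
print("largest entrywise deviation:", max_err)
```

For real-valued Gaussian data the fourth-order moment differs, which is the case where the $(\beta - 1)$ term would contribute.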